In this notebook we're going to use standard deep learning (a convolutional network trained from scratch) to attempt to crack the Dogs vs Cats Kaggle competition.
We are going to downsample the images to 64x64; that's pretty small, but it should be enough (I hope). Furthermore, larger images mean longer training times, and I'm too impatient for that. ;)
Let's have plots appear inline:
In [1]:
%matplotlib inline
We're going to need os, numpy, matplotlib, skimage, OpenCV (cv2), scikit-learn, PyTorch and my batchup library. We also import a few helper modules up front for convenience.
In [2]:
import os, time, glob, tqdm
import numpy as np
from matplotlib import pyplot as plt
import torch, torch.nn as nn, torch.nn.functional as F
import torchvision
import skimage.transform, skimage.util
from skimage.util import montage
from sklearn.model_selection import StratifiedShuffleSplit
import cv2
from batchup import work_pool, data_source
import utils
import imagenet_classes
torch_device = torch.device('cuda:0')
We are loading images from a folder of individual files, so we could approach this in a number of ways.
Our dataset consists of 25,000 images so we could load them all into memory then access them from there. It would work, but it wouldn't scale. I'd prefer to demonstrate an approach that is more scalable and useful outside of this notebook, so we are going to load them on the fly.
Loading images on the fly poses a challenge: we may find that the GPU sits idle while the CPU loads the images needed to build the next mini-batch. It would therefore be desirable to load images in background threads so that mini-batches are ready to process as soon as the GPU can take one. Luckily my batchup library can help here.
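As a quick preview of the batchup API before we use it in earnest: given a list of array-like objects, a data source hands back mini-batches of corresponding items. A minimal sketch on toy arrays (the names toy_X and toy_y are just for illustration, using the data_source module imported above):
toy_X = np.arange(30, dtype=np.float32).reshape((10, 3))
toy_y = np.arange(10) % 2
toy_ds = data_source.ArrayDataSource([toy_X, toy_y])
# Iterate over in-order mini-batches of 4; the final batch holds the 2 left over
for batch_X, batch_y in toy_ds.batch_iterator(batch_size=4):
    print(batch_X.shape, batch_y.shape)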
We must provide the logic for loading each image from disk and converting it into network format: a float32 array of shape (sample, channel, height, width).
Join the Kaggle competition and download the training and test data sets. Unzip them into a directory of your choosing, and modify the path definitions below to point to the appropriate location.
We split the images into training and validation sets later on, so we call them trainval for now.
In [3]:
TRAIN_PATH = r'E:\datasets\dogsvscats\train'
TEST_PATH = r'E:\datasets\dogsvscats\test1'
# Get the paths of the images
trainval_image_paths = glob.glob(os.path.join(TRAIN_PATH, '*.jpg'))
test_image_paths = glob.glob(os.path.join(TEST_PATH, '*.jpg'))
Okay. We have our image paths. Now we need to create our ground truths. Luckily the filename of each image starts with either cat. or dog., indicating which it is. We will assign dogs a class of 1 and cats a class of 0.
In [4]:
# The ground truth classifications are given by the filename having either a 'dog.' or 'cat.' prefix
# Use:
# 0: cat
# 1: dog
trainval_y = [(1 if os.path.basename(p).lower().startswith('dog.') else 0) for p in trainval_image_paths]
trainval_y = np.array(trainval_y).astype(np.int32)
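As a quick sanity check, we can count the samples per class; the full Kaggle training set contains 12,500 of each, so the classes should be perfectly balanced:
# Counts per class: index 0 = cat, index 1 = dog
print(np.bincount(trainval_y))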
In [5]:
# We only want one split, with 10% of the data for validation
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=12345)
# Get the training set and validation set sample indices
train_ndx, val_ndx = next(splitter.split(trainval_y, trainval_y))
print('{} training, {} validation'.format(len(train_ndx), len(val_ndx)))
In [6]:
MODEL_MEAN = np.array([0.485, 0.456, 0.406])
MODEL_STD = np.array([0.229, 0.224, 0.225])
TARGET_SIZE = 64
def img_to_net(img):
"""
Convert an image from
image format; shape (height, width, channel) range [0-1]
to
network format; shape (channel, height, width), standardised by mean MODEL_MEAN and std-dev MODEL_STD
"""
    # Standardise using the model mean and std-dev
    img = (img - MODEL_MEAN) / MODEL_STD
    # (H, W, C) -> (C, H, W)
    img = img.transpose(2, 0, 1)
return img.astype(np.float32)
def net_to_img(img):
"""
Convert an image from
    network format; shape (channel, height, width), standardised by mean MODEL_MEAN and std-dev MODEL_STD
to
image format; shape (height, width, channel) range [0-1]
"""
# (C, H, W) -> (H, W, C)
img = img.transpose(1, 2, 0)
img = img * MODEL_STD + MODEL_MEAN
return img.astype(np.float32)
def load_image(path):
"""
    Load an image from a given path, scale and pad it to `TARGET_SIZE` x `TARGET_SIZE` and convert it to network format (a (channel, height, width) array)
"""
# Read
img = cv2.imread(path)
# OpenCV loads images in BGR channel order; reverse to RGB
img = img[:, :, ::-1]
# Compute scaled dimensions, while preserving aspect ratio
# py0, py1, px0, px1 are the padding required to get the image to `TARGET_SIZE` x `TARGET_SIZE`
if img.shape[0] >= img.shape[1]:
height = TARGET_SIZE
width = int(img.shape[1] * float(TARGET_SIZE) / float(img.shape[0]) + 0.5)
py0 = py1 = 0
px0 = (TARGET_SIZE - width) // 2
px1 = (TARGET_SIZE - width) - px0
else:
width = TARGET_SIZE
height = int(img.shape[0] * float(TARGET_SIZE) / float(img.shape[1]) + 0.5)
px0 = px1 = 0
py0 = (TARGET_SIZE - height) // 2
py1 = (TARGET_SIZE - height) - py0
# Resize the image using OpenCV resize
# We use OpenCV as it is fast
# We also resize *before* converting from uint8 type to float type as uint8 is significantly faster
img = cv2.resize(img, (width, height))
# Convert to float
img = skimage.util.img_as_float(img)
# Convert to network format
img = img_to_net(img)
# Apply padding to get it to a fixed size
img = np.pad(img, [(0, 0), (py0, py1), (px0, px1)], mode='constant')
return img
Show an image to check our code so far:
In [7]:
plt.imshow(net_to_img(load_image(trainval_image_paths[0])))
plt.show()
Looks okay.
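We imported montage from skimage earlier, so we can also eyeball a grid of images at once. A throwaway check, nothing below depends on it (note: on scikit-image 0.19+ the multichannel keyword has been replaced by channel_axis=-1):
# Load the first 16 training images and tile them into a 4x4 grid
imgs = np.stack([net_to_img(load_image(p)) for p in trainval_image_paths[:16]], axis=0)
plt.imshow(montage(imgs, multichannel=True))
plt.show()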
In [8]:
class ImageAccessor (object):
def __init__(self, paths):
"""
Constructor
paths - the list of paths of the images that we are to access
"""
self.paths = paths
def __len__(self):
"""
The length of this array
"""
return len(self.paths)
def __getitem__(self, item):
"""
Get images identified by item
item can be:
- an index as an integer
        - an array of indices
"""
        if isinstance(item, (int, np.integer)):
# item is an integer; get a single item
path = self.paths[item]
return load_image(path)
        elif isinstance(item, np.ndarray):
            # item is an array of indices
            # Get the paths of the images in the mini-batch
            paths = [self.paths[i] for i in item]
            # Load each image
            images = [load_image(path) for path in paths]
            # Stack along axis 0 to make an array of shape `(sample, channel, height, width)`
            return np.stack(images, axis=0)
        else:
            raise TypeError('item must be an int or a NumPy array of indices, not {}'.format(type(item)))
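A quick check of the two access modes, a single index versus an index array (throwaway code; acc is just a local name for illustration):
acc = ImageAccessor(trainval_image_paths)
print(acc[0].shape)                    # a single image: (3, 64, 64)
print(acc[np.array([0, 1, 2])].shape)  # a mini-batch: (3, 3, 64, 64)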
Now we make ArrayDataSource instances for the training and validation sets. These provide methods for getting mini-batches that we will use for training.
In [9]:
# image accessor
trainval_X = ImageAccessor(trainval_image_paths)
train_ds = data_source.ArrayDataSource([trainval_X, trainval_y], indices=train_ndx)
val_ds = data_source.ArrayDataSource([trainval_X, trainval_y], indices=val_ndx)
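Before wiring up the worker pool it's worth pulling a single mini-batch to confirm that the shapes and dtypes are what the network will expect; a throwaway check:
batch_X, batch_y = next(train_ds.batch_iterator(batch_size=4))
print(batch_X.shape, batch_X.dtype)  # expect (4, 3, 64, 64) float32
print(batch_y.shape, batch_y.dtype)  # expect (4,) int32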
In [10]:
# A pool with 4 threads
pool = work_pool.WorkerThreadPool(4)
Wrap our training and validation data sources so that they generate mini-batches in parallel background threads:
In [11]:
train_ds = pool.parallel_data_source(train_ds)
val_ds = pool.parallel_data_source(val_ds)
In [12]:
class PetClassifier (nn.Module):
def __init__(self):
super(PetClassifier, self).__init__()
# First two convolutional layers: 48 filters, 3x3 convolution, 1 pixel padding
self.conv1_1 = nn.Conv2d(3, 48, kernel_size=3, padding=1)
self.conv1_2 = nn.Conv2d(48, 48, kernel_size=3, padding=1)
self.pool1 = nn.MaxPool2d(2)
# Two convolutional layers, 96 filters
self.conv2_1 = nn.Conv2d(48, 96, kernel_size=3, padding=1)
self.conv2_2 = nn.Conv2d(96, 96, kernel_size=3, padding=1)
self.pool2 = nn.MaxPool2d(2)
# Two convolutional layers, 192 filters
self.conv3_1 = nn.Conv2d(96, 192, kernel_size=3, padding=1)
self.conv3_2 = nn.Conv2d(192, 192, kernel_size=3, padding=1)
self.pool3 = nn.MaxPool2d(2)
# Two convolutional layers, 384 filters
self.conv4_1 = nn.Conv2d(192, 384, kernel_size=3, padding=1)
self.conv4_2 = nn.Conv2d(384, 384, kernel_size=3, padding=1)
self.pool4 = nn.MaxPool2d(2)
# Two convolutional layers, 384 filters
self.conv5_1 = nn.Conv2d(384, 384, kernel_size=3, padding=1)
self.conv5_2 = nn.Conv2d(384, 384, kernel_size=3, padding=1)
self.pool5 = nn.MaxPool2d(2)
# Size at this point will be 384 channels, 2x2
self.fc6 = nn.Linear(384 * 2 * 2, 256)
self.drop = nn.Dropout()
self.fc7 = nn.Linear(256, 2)
def forward(self, x):
x = F.relu(self.conv1_1(x))
x = self.pool1(F.relu(self.conv1_2(x)))
x = F.relu(self.conv2_1(x))
x = self.pool2(F.relu(self.conv2_2(x)))
x = F.relu(self.conv3_1(x))
x = self.pool3(F.relu(self.conv3_2(x)))
x = F.relu(self.conv4_1(x))
x = self.pool4(F.relu(self.conv4_2(x)))
x = F.relu(self.conv5_1(x))
x = self.pool5(F.relu(self.conv5_2(x)))
x = x.view(x.shape[0], -1)
x = F.relu(self.fc6(x))
x = self.drop(x)
x = self.fc7(x)
return x
In [13]:
# Build it
pet_net = PetClassifier().to(torch_device)
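A dummy forward pass is a cheap way to confirm the layer arithmetic above, in particular that a 64x64 input really does reach fc6 as 384 channels at 2x2:
with torch.no_grad():
    print(pet_net(torch.zeros(1, 3, 64, 64, device=torch_device)).shape)  # torch.Size([1, 2])
Note that the network outputs raw logits rather than probabilities; this is deliberate, as PyTorch's nn.CrossEntropyLoss (used below) applies log-softmax internally.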
In [14]:
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(pet_net.parameters(), lr=1e-3)
In [15]:
NUM_EPOCHS = 50
BATCH_SIZE = 128
The training loop:
In [16]:
print('Training...')
for epoch_i in range(NUM_EPOCHS):
t1 = time.time()
# TRAIN
pet_net.train()
train_loss = 0.0
n_batches = 0
# Ask train_ds for batches of size `BATCH_SIZE` and shuffled in random order
for i, (batch_X, batch_y) in enumerate(train_ds.batch_iterator(batch_size=BATCH_SIZE, shuffle=True)):
t_x = torch.tensor(batch_X, dtype=torch.float, device=torch_device)
t_y = torch.tensor(batch_y, dtype=torch.long, device=torch_device)
# Clear gradients
optimizer.zero_grad()
# Predict logits
pred_logits = pet_net(t_x)
# Compute loss
loss = loss_function(pred_logits, t_y)
# Back-prop
loss.backward()
# Optimizer step
optimizer.step()
# Accumulate training loss
train_loss += float(loss)
n_batches += 1
    # Divide by the number of batches to get the mean training loss
train_loss /= float(n_batches)
# VALIDATE
pet_net.eval()
    val_err = 0.0
# For each batch:
with torch.no_grad():
for batch_X, batch_y in val_ds.batch_iterator(batch_size=BATCH_SIZE, shuffle=False):
t_x = torch.tensor(batch_X, dtype=torch.float, device=torch_device)
# Predict logits
pred_logits = pet_net(t_x).detach().cpu().numpy()
pred_cls = np.argmax(pred_logits, axis=1)
val_err += (batch_y != pred_cls).sum()
    # Divide by the number of validation samples to get the error rate
val_err /= float(len(val_ndx))
t2 = time.time()
# REPORT
print('Epoch {} took {:.2f}s: train loss={:.6f}; val err={:.2%}'.format(
epoch_i, t2 - t1, train_loss, val_err))
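Training from scratch takes a while, so it's worth saving the learned weights once the loop finishes. A minimal sketch; the filename pet_net.pt is arbitrary:
# Save just the parameters (the state dict), not the whole module
torch.save(pet_net.state_dict(), 'pet_net.pt')
# To restore later:
#   pet_net = PetClassifier().to(torch_device)
#   pet_net.load_state_dict(torch.load('pet_net.pt'))
#   pet_net.eval()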
In [17]:
# Number of samples to try
N_TEST = 15
# Shuffle test sample indices
rng = np.random.RandomState(12345)
test_ndx = rng.permutation(len(test_image_paths))
# Select first `N_TEST` samples
test_ndx = test_ndx[:N_TEST]
for test_i in test_ndx:
# Load the image
    X = load_image(test_image_paths[test_i])
with torch.no_grad():
t_x = torch.tensor(X[None, ...], dtype=torch.float, device=torch_device)
        # Predict logits, then class probabilities via softmax
pred_logits = pet_net(t_x)
pred_prob = F.softmax(pred_logits, dim=1).detach().cpu().numpy()
# Get predicted class
pred_y = np.argmax(pred_prob, axis=1)
# Get class name
pred_cls = 'dog' if pred_y[0] == 1 else 'cat'
# Report
print('Sample {}: predicted as {}, confidence {:.2%}'.format(test_i, pred_cls, pred_prob[0,pred_y[0]]))
# Show the image
plt.figure()
plt.imshow(net_to_img(X))
plt.show()
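Finally, to actually enter the competition we need a submission CSV with id and label columns, where id comes from the test filename (<id>.jpg). Depending on the competition version, label is either the predicted class or the predicted probability of 'dog'; the sketch below writes the probability (check the competition's evaluation page before submitting). Batching as in the training loop would be faster; this keeps it simple:
import csv

pet_net.eval()
with open('submission.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'label'])
    with torch.no_grad():
        for path in test_image_paths:
            img_id = os.path.splitext(os.path.basename(path))[0]
            t_x = torch.tensor(load_image(path)[None, ...], dtype=torch.float, device=torch_device)
            prob_dog = float(F.softmax(pet_net(t_x), dim=1)[0, 1])
            writer.writerow([img_id, prob_dog])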